

Oral-Proficiency Testing:
A Critical Analysis


JAMES P. LANTOLF and WILLIAM FRAWLEY


PERHAPS THE MOST DOMINANT MOVEMENT TO 
surface within the language teaching profession 
over the last five years is the attempt by the 
American Council on the Teaching of Foreign 
Languages and the Educational Testing Service 
(henceforth, ACTFL/ETS) to establish and im- 
plement second-language proficiency guidelines 
for testing and for organizing the language 
teaching curriculum. The ACTFL/ETS Guide-
lines are based in large part on the Foreign Ser-
vice Institute (FSI) oral-proficiency test that has 
been in use in the federal government for nearly 
thirty years. While a history of the FSI and 
ACTFL/ETS Guidelines is an interesting study 
in itself, our concern in this paper is with what 
we perceive to be serious, if not insurmount- 
able, problems with the ACTFL/ETS Guide-
lines.1 Although we perceive difficulties with the
Guidelines established for all four language skills 
and cultural performance, we will focus the 
present discussion on the oral-proficiency Guide-
lines only.

To date, only minimal skepticism has been 
expressed toward oral-proficiency testing 
(OPT) as defined by ACTFL/ETS. Savignon, 
however, at least cautions that perhaps the pro- 
fession is moving too quickly toward implemen- 
tation of the ACTFL/ETS Guidelines when she 
points out that they do not give sufficient con- 
sideration to research in communicative com-
petence.2 Although we concur with Savignon’s
general position, we believe, as will be argued
below, that her criticism of OPT, as reflected 
in the Guidelines, does not go far enough. 


ORAL-PROFICIENCY TESTING: SOME 
INCONSISTENCIES 

We begin our discussion with an examina-
tion of what, in our view, are serious inconsis-
tencies relating to three assumptions of OPT:
1) the number of hours required to produce an
S-2 level speaker; 2) the relation between topic
of discourse and accuracy; and 3) errors in
communication.


The Modern Language Journal, 69, iv (1985)
0026-7902/85/0004/337 $1.50/0
©1985 The Modern Language Journal

One of the observations found in the litera- 
ture on OPT is that a relationship may exist 
between the number of hours of language study 
and a proficiency rating. While we doubt 
whether such a relationship necessarily holds 
in the case of any specific individual learner,
we note a sharp discrepancy between the claims 
made by Richard Brod, and cited in a number 
of recent papers on OPT, and the findings re- 
ported by Lambert with regard to the number 
of hours required to produce a level two 
speaker.3 Brod (p. 10) contends that, under
optimal conditions of reduced class-size with in- 
tensive instruction, highly motivated learners 
can attain an S-2 rating in a western European 
language in approximately 480 hours. Schulz 
(p. 3) interprets this figure in terms of the tra- 
ditional college curriculum and concludes that 
it means eight four-credit semesters of language 
study “with a predominant and intensive focus 
on language skills development” —an ambitious, 
although not unattainable goal. Lambert and 
colleagues (p. 72), on the other hand, report 
that government language schools require a 
minimum of 840 hours of intensive training to
bring learners to an S-2 level in what they de-
scribe as “easy languages” (e.g., Spanish,
French, and Italian). Translated into univer-
sity semesters (i.e., five days per week for one
hour per day), this schedule means a minimum 
of slightly more than eleven semesters of study. 
Thus, the concern expressed by some authors 
regarding instructional programs that might 
produce terminal 2/2+ speakers is at best pre-
mature.4 The language-teaching profession had 
better consider carefully the disparate findings 
presented in the Brod paper and the Lambert 
report before concerning itself with reorgani- 
zation of the language curriculum or with the 
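
The semester conversions cited in this paragraph can be checked with a short calculation. The sketch below is illustrative only: it assumes a fifteen-week semester and one contact hour per credit per week, neither of which is stated in the sources cited, and the function name `semesters_needed` is hypothetical.

```python
# Rough check of the hours-to-semesters conversions discussed above.
# Assumption (not stated in the sources): a semester lasts 15 weeks.
WEEKS_PER_SEMESTER = 15


def semesters_needed(total_hours: float, hours_per_week: float) -> float:
    """Return the number of semesters needed to accumulate total_hours."""
    hours_per_semester = hours_per_week * WEEKS_PER_SEMESTER
    return total_hours / hours_per_semester


# Brod's 480 hours, read by Schulz as four-credit (four hours per week) courses.
brod = semesters_needed(480, 4)

# Lambert's 840 hours at five one-hour meetings per week.
lambert = semesters_needed(840, 5)

print(f"Brod/Schulz: {brod:.1f} semesters")
print(f"Lambert:     {lambert:.1f} semesters")
```

Under these assumptions the two estimates come to eight semesters and roughly 11.2 semesters, which matches the figures quoted and makes the size of the discrepancy explicit.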




norm-referenced testing, and multi-trait ma-
trices.12 The solution, in our opinion, rests on
consideration of the much deeper philosophical 
issues which form the basis of the ACTFL/ETS 
Guidelines. Unless these issues are resolved, 
attempts to develop criterion-referenced rather 
than norm-referenced tests and research on the 
construct and content validity of such tests are, 
at best, limited, and at worst, irrelevant. We 
believe that profound problems in OPT arise 
from an analytic approach to testing and the 
native-speaker yardstick. In what follows, we 
present detailed objections to both of these 
points as they are reflected in the ACTFL/ETS 
Guidelines and discussed by other scholars.13

Analytic Testing. One of the assumptions be- 
hind the ACTFL/ETS Guidelines is that profi- 
ciency can be schematized by two pyramids: 
one, from base to point, reflecting the reality 
that many more novice speakers (base) exist 
than superior-level speakers (point) in L2; and 
the other, the inverse of the previous schema, 
reflecting that being a novice (inverted point) 
is easy and advancement to superior-level (in- 
verted base) becomes progressively more diffi- 
cult. First of all, these diagrams do not state 
empirical truths; they state analytic truths: 
things that are true by definition only.14 The
claim that more novice speakers of L2 exist 
than superior-level speakers is a truism; no one 
would think of setting out to prove it. Second, 
each diagram is taken in support of the other: 
the reason that we have more novices than 
superior-level speakers is that it is more diffi- 
cult to be a superior-level speaker; the reason 
that it is easier to move from novice to inter- 
mediate than from intermediate to superior is 
that more people do the former than the latter. 
Such circularity is a consequence of the ana- 
lytic nature of the claims, as all circularity is 
analytic. One would be hard-pressed to prove 
some empirical isomorphism between the two 
models. 

The above assumptions regarding levels and 
ease belie a distinct tendency in OPT to deduce 
necessary facts a priori and then to prove them, 
which is an elaborate form of question-begging. 
The real question is not whether it is easier to 
pass from novice to intermediate than from in- 
termediate to superior, but whether any novice, 
intermediate, or superior speakers exist at all. 
The ACTFL/ETS Guidelines are complicated 
ways of proving, propagating, and imposing 


339 


analytic truths with reference to the model, and 
then masking these claims as empirical truths 
which reify pseudo-observations as fact. 

Liskin-Gasparro writes “. . . the proficiency
guidelines and tests built to measure proficiency
are by definition criterion-referenced.”15 What
does this mean? A criterion-referenced test is 
one that gives an absolute scale and measures 
individual performance with reference to the 
scale, not with reference to a norm or average, 
such as a driving test, where driving compe- 
tence is defined by certain tasks, and where 
people are judged against those criteria, not 
against the performance of others on the test. 
The point here is that the criteria themselves 
define competence absolutely, and this compe- 
tence may have nothing to do with real-world 
performance, as in the case of driving tests: no 
one would seriously claim that discrete per- 
formances on backing up, signaling, and turn- 
ing constitute real driving competence. Simi- 
larly, “measures of communicative performance 
must not be taken as an indication of some 
absolute amount of success an individual has 
in communicating.”16 We cannot ignore the fact
that in normal verbal interaction the factor of 
“maximal interlocutor co-operation” comes into 
play.17 This point is especially significant in
light of what we know about the nature of dia-
logic speech.18

Why then should criterion-referenced tests 
like those proposed by ACTFL/ETS be taken 
as a measure of anything real? Criterion-ref- 
erenced tests impose competencies on the exam- 
inees and measure the extent to which the per- 
son deals with the imposition. Other researchers 
have recently made similar claims. Spolsky 
(p. 43) comments that despite the importance 
of maximal interlocutor cooperation in real-world 
communication, OPT “favors autonomous ver- 
balization that idealizes the communication to 
relative strangers of the maximum amount of 
new knowledge using only verbal means.” 
Troike contends that OPT is in itself a social 
event “constituted and constructed by the par-
ticipants in the event.”19 It is therefore difficult
to see how performance on such a test might 
somehow reflect absolute linguistic proficiency. 
In fact, according to Wald, “no matter how lan- 
guage proficiency is subcategorized in theory, 
in practice the actual assessment of all its com- 
ponents are [sic] closely associated with the 
measurement of language performance in test 


no advantage in construing the Guidelines as en- 
tailments. If X entails Y (level implies criteria), 
then Y does not necessarily entail X (criteria 
do not necessarily imply level). This failure of 
reversibility can be seen readily in lexical defi- 
nitions: “fly” entails “leave the ground” but 
“leave the ground” does not entail “fly” since it
could entail “jump” or “leap.” Thus, using a dis- 
claimer which essentially says that the Guide- 
lines are entailments leaves the levels undefined. 
Since oral-proficiency researchers are primarily 
interested in the levels, so that speakers can be 
pigeon-holed, they have defined nothing, which
is tantamount to question-begging. The ques- 
tion is: what are the levels? Answering with an 
entailment does not indicate what the levels are. 
There can also be false entailments.30 An
ACTFL/ETS level can falsely entail its criteria,
which leads to the second disclaimer. 
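
The failure of reversibility described above can be sketched in a few lines. The example is only an illustration under stated assumptions: the dictionary `ENTAILS` and the helpers `entails` and `terms_entailing` are hypothetical names, and the three verbs are taken from the lexical example in the text.

```python
# Toy model of one-way entailment, using the lexical example from the text.
# ENTAILS maps a term (X) to the set of things it entails (Y).
# The data and names are illustrative assumptions, not part of any Guideline.
ENTAILS = {
    "fly": {"leave the ground"},
    "jump": {"leave the ground"},
    "leap": {"leave the ground"},
}


def entails(x: str, y: str) -> bool:
    """True if term x entails property y in the toy model."""
    return y in ENTAILS.get(x, set())


def terms_entailing(y: str) -> list[str]:
    """All terms that entail y; more than one means y cannot identify x."""
    return sorted(x for x, ys in ENTAILS.items() if y in ys)


print(entails("fly", "leave the ground"))   # X entails Y
print(entails("leave the ground", "fly"))   # Y does not entail X
print(terms_entailing("leave the ground"))  # Y is compatible with several X
```

Reading "term" as level and "property" as criteria, the sketch shows why meeting the criteria leaves the level undetermined whenever more than one level entails the same criteria.
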
Omaggio states that the ACTFL/ETS Guide-
lines:

. . . are experientially, rather than theoretically,
based; that is, they describe [italics added] the way
language learners and acquirers typically function
along the whole range of possible levels of compe-
tence, rather than prescribe [italics added] the way any
given theorist thinks [italics added] learners ought to 
function. Because the descriptions represent actual 
rather than hypothetical language production, we 
can amend our expectations for learners’ linguistic 
and communicative development to conform to 
reality. . . . Knowing what competencies lie at the 
next level helps us sequence materials that conform 
to natural development patterns among adult second- 
language learners and choose activities that enable 
them to make progress toward the goals identified 
at the beginning of instruction.*! 


Omaggio is arguing for true entailment of the 
criteria on the basis of their reality.32 Apart
from the fact that this assumption reverses the 
entailment (we are now going from Y to X on 
the basis of the empirical truth of Y, which then 
remakes the implication into a symmetric one 
and contradicts the program by reestablishing
absolute definitions), we should consider 
whether or not the Guidelines are experientially 
based. We know of no empirical evidence for 
the gradation of linguistic criteria. In their defi- 
nition of linguistic criteria for the task variable, 
Higgs and Clifford claim that the resolution of 
problems is a less advanced task than persuasion. 
Likewise, a level five is characterized by use 
of professional lexicon, which supposedly is
somehow more advanced than use of everyday
lexicon. There is, however, no evidence to sup-
port either assertion. In fact, for lexical use, the
opposite is true. A specialized lexicon is well-
circumscribed, and to engage in professional
talk, one needs to know and use very few verbs
and conjunctions, and a small set of nouns.33
Why, then, should the speech of an educated 
native speaker in a professional context be more 
advanced than in an everyday setting?34 Even
the FLOPA manual recognizes this problem: “In 
universities there are American Spanish pro- 
fessors who can speak eloquently about Calde- 
ron’s dramas, but who could never read a Span- 
ish contract or haggle over the terms of a lease 
of kitchen privileges for students who might 
choose to live with native families” (1-3). 

The linguistic criteria are claimed to be real, 
but we have no empirical evidence to support the 
claim. A cursory look at the empirical litera- 
ture shows that the criteria are either not real 
or not graded. Clark argues that a major theo- 
retical problem for OPT is to develop “suitable 
criterion measures of speaking proficiency in 
actual communicative situations . . . against 
which the common measure test could be ap-
propriately validated.”35 He further points out
(p. 20) that attempts to set up communicative 
situations in the direct testing format to reflect 
real-life contexts even minimally have not met 
with much success. In this regard, Jones reports 
on a study designed to engage native speakers 
of German in so-called real-life situations (an 
elicitation technique used from 2+ to 5 
levels).36 He discovered that natives react un-
naturally to such situations, providing only 
short simple responses compared to the elabo- 
rate explanations volunteered by the nonnative 
participants in the experiment. This finding is 
not at all surprising given Wald’s distinction 
between test language and spontaneous lan- 
guage discussed above. 

The divorce by Higgs and Clifford of syn- 
tax and lexicon in evaluation is equally ques-
tionable. Recent theoretical advances on the
lexical basis of transformations show that syn-
tax and lexicon are intimately connected.37
Omaggio’s claim regarding the reality of the 
proficiency criteria is clearly erroneous: the 
Guidelines are, in fact, the constructions of 
theorists and they prescribe what a speaker 
ought to be able to do. 

The preceding discussion aside, however, the 



that the Guidelines are not norm-referenced.43

The Sacred Native Speaker. Since criterion-refer-
enced tests do not judge performance on the 
basis of a population but only on testee per- 
formance in relation to the criteria, criterion- 
referenced tests are not norm-referenced. Sup- 
posedly, the ACTFL/ETS Guidelines are not 
norm-referenced, since performance is in 
theory, at least, judged only in relation to the 
(reductive and analytic) criteria. The absence
of norm-referencing is taken to be a distinct 
advantage in OPT, because it is thought that 
each speaker can be judged individually with 
reference to the levels as they are absolutely de- 
fined. But a deeper look reveals that such tests 
are implicitly norm-referenced. Let us recon- 
sider the quote from Omaggio (see p. 341 
above), who states that the Guidelines are de- 
rived from what learners and acquirers typi- 
cally do. This statement reveals that the Guide-
lines are derived from an implicit notion of the
mean linguistic behavior of an ideal speaker, 
which sounds curiously like implicit, if not ex- 
plicit, norm-referencing. 

Let us also reconsider the driving-test 
example discussed above. Such a test is not 
norm-referenced because every driver does all 
of the activities listed as the criteria for driving 
proficiency, so there is no need to make a judg- 
ment in relation to the performance of other 
testees: the abilities are either there or not. Is 
the same true of the ACTFL/ETS Guidelines? We 
think not. 

The Guidelines are based on the explicit claim 
that all L2 speakers must be measured ulti- 
mately in relation to the educated native 
speaker, who is thus taken as the norm against 
which all L2 speaker performance is judged. 
This criterion would not be detrimental, how- 
ever, provided that we knew what a native 
speaker was: if we could say, as in a driving 
test, that all native speakers must do the things 
that the Guidelines specify. That is not the case, 
however. 

The Guidelines are not concerned with native 
speakers; they are concerned with the native 
speaker. A recent volume of papers directed 
toward the question of the native speaker con- 
vincingly demonstrates that the native speaker
does not exist.44 The arguments are detailed,
but most are variations of Ballmer's claim that 
the native speaker does not exist as such; only 
types of native speakers do.45 He argues (p. 64)
that four general classes of native speakers
exist: idiolectal (informants), statistical (typical
speakers), normative (expert and perfect speak-
ers), and former (speakers from historical written
records). Clearly, oral-proficiency tests which 
take the educated native speaker as the summum 
bonum rely on a yardstick that is a melding of 
a statistical with a normative native speaker. 
An educated speaker able to converse about 
professional topics is an expert speaker, a type 
of normative native speaker; a speaker who is
indistinguishable from an educated native 
speaker and who makes no mistakes is a per- 
fect speaker, another type of normative speaker. 
The Guidelines derive from typical learners and 
acquirers, and a typical native speaker is a sta- 
tistical abstraction based on a mean or norm. 
Thus, the epitome of OPT, the ultimate yard- 
stick of the native speaker as typical or expert, 
is norm-based. The ACTFL/ETS Guidelines 
judge L2 performance implicitly against a 
statistically and normatively derived entity. 
Coulmas contends that the maximum a 
second language learner can become is a fac- 
simile of a native speaker, a speaker who reacts 
as if he were a native speaker by not attracting 
too much attention to himself.46 To achieve
this, however, requires a radical personality 
shift on the part of the learner. Since the 
ACTFL/ETS Guidelines are not concerned with 
socio-psychological variables, L2 speakers can- 
not even be properly judged as facsimiles of 
native speakers —a notion, as we have argued, 
that is at best an abstraction. Van der Geest 
asserts that the native speaker yardstick ought 
to be rejected because the notion is too re- 
stricted, disallows individual variation, is un- 
reliable, and is generally disappointing in its 
application.47 Thus, if we know anything, we
know that the native speaker is not a theoreti- 
cally interesting construct, since the construct 
is neither unitary nor reliable. Yet the ACTFL/ 
ETS Guidelines take the native speaker as a 
given and judge L2 speaker performance 
against it, in particular against a statistically 
and normatively derived abstraction: the “typi-
cal” learner and the educated native speaker.48
Based on the arguments we have made in 
this paper, we conclude that OPT, as mani- 
fested in the ACTFL/ETS Guidelines, is analytic 
and not criterion referenced. Tests of this type 
measure the performance of the second-lan- 
guage speaker in relation to an analytically de- 




22Symmetric implication is equivalent to equality, or
mutual substitutability, when X implies Y and Y implies
X. For example, a policeman is an officer of the law, and
an officer of the law is a policeman; “policeman” and “officer
of the law” are thus equal and mutually substitutable.

23Liskin-Gasparro (note 13 above), p. 481.

24Liskin-Gasparro (note 13 above); Omaggio (note 13
above).

25Nothing can be both A and not A.

26As in Higgs & Clifford's (note 4 above) discussion of
terminal 2 profiles.

27That is, the truth of the Guidelines is determined in terms
of the Guidelines themselves.

28A closed system is a system which allows nothing
external to the system and whose structures can be pre-
dicted from the combination of the initially given structures.
For an in-depth discussion of closed and open systems as
they relate to the proficiency construct, see James P. Lantolf
& William Frawley, “Proficiency: How to Make a Native
Speaker,” presented at the conference on New Develop-
ments in Language Teaching sponsored by the Foreign
Service Institute, 7 May 1985.

29Entailment is true or false implication. If X is true,
then Y is true; if X is false, then Y is false. Thus, if the
level is true, then the criteria are true.

30“Fly” falsely entails “stay on the ground.”

31Omaggio (note 4 above), p. 44.

32Despite Omaggio’s claim that the ACTFL/ETS levels
correspond to natural developmental sequences in adult L2
learners, she later asserts in the same paper that we do not
know “how adults acquire and/or learn second languages
in either formal or informal settings,” Omaggio (note 4
above), p. 46.

33William Frawley & Raoul N. Smith, “Patterns of Coher-
ence in Genre-Specific Discourse,” Humans and Computers:
The Interface through Language, ed. Stephanie Williams (Nor-
wood, NJ: Ablex, in press); Raoul N. Smith & William
Frawley, “Conjunctive Cohesion in Four English Genres,”
Text, 3-4 (1983), pp. 347-73.

34One of the authors (Frawley) can talk about linguistics
in Russian, but cannot order a meal very well in this lan-
guage: thus, he is both a level 5 and a novice in Russian.

35Clark (note 11 above), p. 19.

36Randall L. Jones, “Interview Techniques and Scoring
Criteria at the Higher Proficiency Levels,” Direct Testing of
Speaking Proficiency: Theory and Application, ed. John L. D.
Clark (Princeton: Educational Testing Service, 1978),
p. 96.

37Igor A. Mel'čuk, Dictionnaire explicatif et combinatoire du
français contemporain (Montreal: Univ. of Montreal Press,
1984); Joan Bresnan, “A Realistic Transformational
Grammar,” Linguistic Theory and Psychological Reality, ed.
Morris Halle (Cambridge: MIT Press, 1978).

38Liskin-Gasparro (note 13 above), p. 482.

39Omaggio (note 13 above); Liskin-Gasparro (note 13
above).

40Omaggio (note 13 above), p. 330.

41Omaggio (note 4 above), p. 44.

42Kieran Egan, Education and Psychology: Plato, Piaget and
Scientific Psychology (New York: Teachers College Press,
1983), p. 153.

43See Liskin-Gasparro (note 13 above), p. 488.

44A Festschrift for Native Speaker, ed. Florian Coulmas (The
Hague: Mouton, 1981).

45Th. T. Ballmer, “A Typology of Native Speakers,” A
Festschrift for Native Speaker (note 44 above).

46Florian Coulmas, “Spies and Native Speakers,” A Fest-
schrift for Native Speaker (note 44 above), p. 358.

47Ton van der Geest, “How to Become a Native Speaker:
One Simple Way,” A Festschrift for Native Speaker (note 44
above), p. 349.

48As we have argued elsewhere, using the native speaker
as the ultimate yardstick of second language performance
reflects the general and erroneous assumption prevalent in
much of the second language research literature that we
cannot understand learner performance and the learning
process without comparing these phenomena to baseline
data drawn from native speakers. See William Frawley &
James P. Lantolf, “Speaking and Self-Order: A Critique
of Orthodox L2 Research,” Studies in Second Language Acqui-
sition, 6 (1984), pp. 143-59.


